Introduction

Most scientific careers are forged through the accumulation of papers and citations (Freeling et al., 2019). Though not without controversy (Aksnes et al., 2019; Davies et al., 2021; McNutt, 2014; Todd & Ladle, 2008), metrics such as the number of publications and the number of citations remain the agreed-upon measures of scientific success. For example, in our country (Italy), the academic habilitation for Associate and Full Professor is essentially based on three criteria: the number of citations, the h-index, and the total number of publications. This evaluation of careers by numbers puts academics under constant pressure to publish high-impact papers (Smaldino & McElreath, 2016). As a result, the ability to publish and attract citations is becoming an essential academic skill.

Ideally, the impact of a scientific publication should depend solely on the quality of the science reported therein. In scientometric studies, scientific quality is typically captured by analysing factors such as the importance of the topic, level of evidence, novelty, study design, and methodology (Tahamtan & Bornmann, 2019; Tahamtan et al., 2016; Xie et al., 2022). However, evidence is accumulating that several non-scientific features contribute to the short- to long-term reach and impact of any given paper, including stylistic choices (Heard et al., 2022; Letchford et al., 2016; Martínez & Mammola, 2021; Murphy et al., 2019), number of authors (Fox et al., 2016), biases related to authors’ language and gender (Andersen et al., 2019; Fu & Hughey, 2019), and availability as a preprint (Fu & Hughey, 2019). By capitalizing on the conclusions of such analyses, researchers may be tempted to create their own checklist of dos and don'ts to maximise publication impact.

In recent years, a number of seminal literature reviews have attempted to organise this constantly growing body of literature in a digestible way (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019; Tahamtan et al., 2016). Furthermore, existing meta-analyses have examined the influence of single features on citations, such as paper length (Xie et al., 2019) or level of collaboration (Shen et al., 2021). However, the picture is still rather crude, insofar as the effects of these non-scientific features are often weak, and their directions are likely to vary across studies, disciplines, and datasets. Two important questions naturally arise. First, can we reach a quantitative consensus on which non-scientific features correlate most strongly with citation counts? Second, and perhaps more importantly, to what extent should we strive to achieve scientific impact by focusing our attention on these non-scientific shortcuts?

We examined these two complementary questions by undertaking a meta-analysis of the scientometric literature, designed to build a quantitative understanding of the effect of non-scientific features on the citations of scientific articles. We focused on citation count, as this is among the commonest metrics by which scientific outputs are measured and also the most common proxy analysed in the scientometric literature (Fig. 1a). After a systematic literature search in the Web of Science (Supplementary material Table S1), we identified 262 publications testing for the importance of non-scientific features on the impact of scientific papers. We assigned each non-scientific feature to one of twenty-five major response categories across five broader general groups: Writing features and style, Graphical elements, Pre/post-publication practices, Authorship, and Authority bias (Fig. 1b). We first looked at the consensus across our database on the impact of non-scientific features on the number of citations. As the dataset covered a wide range of disciplines and temporal spans, we also tested whether the different effects have varied over time and whether they varied across disciplines. If citations are to be used as a measure of scientific quality, then the influence of non-scientific article features on citation counts needs to be minimised. A further objective of this study was thus to develop a method by which such influences can be assessed and monitored.

Fig. 1

Summary of the study design and the sampled literature. a Response variables, namely the measures of article impact (note that, due to reduced sample size, only citations were analysed). b Predictor variables, namely all the non-scientific features that may affect the response. Colour coding refers to the grouping of variables into five broad categories. Only those features considered in more than four articles were analysed. Sample sizes in (b) refer to the dependent variable, citations. Note that the variable “Others” was not analysed despite being featured in more than five articles.

Methods

Literature search

We conducted a systematic review of the science-of-science literature investigating the effect of non-scientific features on article impact, using the Web of Science Core Collection database over all citation indices, all document types, all years, and all languages (queries were made between 01 and 03 July 2020). An overview of the plan for the meta-analysis is reported in Supplementary material Box S1, and a PRISMA diagram (Moher et al., 2009; Page et al., 2021) is given in Supplementary material Fig. S1. Based on background knowledge of factors that may affect an article’s impact (Tahamtan et al., 2016), we conducted separate queries for different non-scientific features relating to Title, Abstract, Keywords, Main text, Figures, Author-list, Bibliography, Publication strategy, and Post-publication strategy (Supplementary material Table S1; these non-scientific features were subsequently grouped into 25 major response categories across 5 broader general groups, as in Fig. 1b).

We initially screened titles and abstracts to exclude clearly inappropriate references. In addition, we applied a series of stricter exclusion criteria (Supplementary material Box S1), eliminating: (i) purely descriptive papers (e.g., Editorial and Opinion pieces); (ii) research articles not quantifying article impact numerically; (iii) studies focusing solely on variables related to the scientific content (e.g., importance of the topic, novelty, study design, methodology); and (iv) articles that concerned trends over time with no reference to article impact/quality. Tests of inter-rater agreement between authors with Cohen’s kappa (Cohen, 1960) showed a good level of repeatability of study selection using the specified criteria for all searches (Supplementary material Table S1). Further details of the search procedure and the repeatability analysis are given in Supplementary text.

Following the screening phase, we extracted predictors and response variables for each paper, as well as relevant meta-data (discipline, sample size, number of journals analysed, and the minimum, maximum, and range of publication years considered). At this stage, we identified several studies that could not be used in the final analysis due to inadequate presentation of results; we therefore contacted the corresponding authors of these studies to request the missing information, which yielded further usable data from 19 studies (response rate = 58.5%).

Effect size calculation

We used only the number of citations as our dependent variable as this is the main currency for assessing perceived quality of scientific literature (Aksnes et al., 2019). All other response variables lacked a sufficient sample size (Fig. 1a).

We converted the test statistics describing the link between citations and non-scientific features of papers to Pearson’s r using standard conversion formulas (Lajeunesse, 2013). Pearson’s r expresses the effect size, i.e. the strength of the linear association between the number of citations and a given non-scientific feature. It ranges continuously between −1.0 and 1.0, where positive values indicate a positive effect of the feature on citations, values close to 0 indicate little or no association, and negative values indicate a negative effect. We calculated Pearson’s r for every comparison within a study, such as when the authors analysed different non-scientific features or different levels of a factor.
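For illustration, the most common of these conversions can be written as one-line functions. The following R sketch follows the standard formulas compiled by Lajeunesse (2013); the function names and example values are ours and purely illustrative, not part of the actual analysis pipeline.

```r
# Standard conversions from common test statistics to Pearson's r
# (formulas as compiled in Lajeunesse, 2013). Illustrative only.
r_from_t     <- function(t, df) t / sqrt(t^2 + df)  # from a t-statistic
r_from_z     <- function(z, n)  z / sqrt(n)         # from a standard normal deviate
r_from_chisq <- function(x2, n) sqrt(x2 / n)        # from a 1-df chi-squared statistic
r_from_d     <- function(d)     d / sqrt(d^2 + 4)   # from Cohen's d (equal group sizes)

# Example: a study reporting t(148) = 2.10 for some feature
r_from_t(2.10, df = 148)  # ~0.17
```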

Meta-analysis

We conducted analyses in R version 4.0.3 (R Core Team, 2021), using the ‘metafor’ package version 2.4.0 (Viechtbauer, 2010). To approximate normality, we converted Pearson’s r to Fisher’s z (Rosenberg et al., 2013). As in Chamberlain et al. (2020), for the presentation of the results we back-transformed Fisher’s z values and their 95% confidence intervals to Pearson’s r to ease visualisation. We interpreted model-derived estimates of Pearson’s r as the strength of the standardized effect, which was considered significant when the 95% confidence interval did not overlap zero.
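A minimal sketch of this transformation step in R, using ‘metafor’ with made-up data (the data frame and its columns are hypothetical):

```r
library(metafor)

# Illustrative correlations (ri) and sample sizes (ni)
dat <- data.frame(ri = c(0.12, -0.05, 0.20), ni = c(250, 1200, 90))

# Fisher's z transformation: yi = atanh(ri), sampling variance vi = 1/(ni - 3)
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)

# Fit on the z scale, then back-transform the estimate and its 95% CI
# to Pearson's r (r = tanh(z)) for presentation
res <- rma(yi, vi, data = dat)
predict(res, transf = transf.ztor)
```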

In the first set of meta-analytic linear mixed-effects models (‘metafor’ function rma.mv), we assessed the extent to which different non-scientific features affect the number of citations across the whole sample. In all models, we specified a publication-level nesting factor to account for study-level non-independence due to multiple measurements per study (mean ± s.d. = 9.4 ± 34.1 estimates/publication). Many journals have fixed formats (e.g., abstract length, number of keywords). Some of our results may have arisen because journals that have particular formats, especially in terms of writing features and style, also happen to be cited more. To verify the robustness of our approach, we therefore repeated the first analysis on a subset of scientometric papers that only focused on a single journal. This ensured that the estimates derived from this subset of papers were controlled for journal-specific features such as Impact Factor, ranking, field or international visibility.
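The multilevel structure described above can be sketched as follows (a minimal, self-contained example with simulated data; all object and column names are ours, not those of the actual analysis):

```r
library(metafor)
set.seed(1)

# Simulated Fisher's z estimates nested within publications
dat <- data.frame(
  yi  = rnorm(12, mean = 0.05, sd = 0.10),  # standardized estimates
  vi  = runif(12, 0.001, 0.010),            # sampling variances
  pub = rep(1:4, each = 3),                 # publication-level nesting factor
  id  = 1:12                                # one row per estimate
)

# Random effects for estimates nested within publications account for
# study-level non-independence (multiple measurements per study)
mod <- rma.mv(yi, vi, random = ~ 1 | pub / id, data = dat)
summary(mod)
```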

Subsequently, we tested the influence of two moderators (that is, variables that condition the effect size in meta-analyses) on the citation effect. We explored temporal effects by running a set of models that included time as a moderator for each predictor variable, aiming to establish whether the effect was moving towards zero (namely, toward a lack of effect) or away from zero (an increasingly positive or negative effect) in recent years. To this end, we included the minimum publication year of the dataset analysed in each study as a moderator in each model. This allowed us to test whether effects differed between analyses based on older versus more recent samples of the literature. We also repeated the analysis using the maximum year of publication instead of the minimum, with similar results (Supplementary material Fig. S3). The direction of the effect was based on the direction of the estimate for each moderator. We considered cases where the effect of year was significant as evidence of a significant temporal effect (bold arrows in Fig. 2a). As a general indication, we also reported the direction of non-significant effects.
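Continuing the sketch above, the temporal moderator enters as a continuous fixed effect; min_year is a hypothetical column holding the minimum publication year of each study’s dataset.

```r
# Year as a continuous moderator: the sign of its coefficient indicates
# whether the effect is moving towards or away from zero over time
dat$min_year <- rep(c(1995, 2002, 2008, 2015), each = 3)
mod_year <- rma.mv(yi, vi, mods = ~ min_year,
                   random = ~ 1 | pub / id, data = dat)
summary(mod_year)
```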

Fig. 2

Results of the meta-analysis. Estimates of the effect size of non-scientific features on article citations, expressed as standardized Pearson’s r ± 95% confidence limits. Sample sizes are given in parentheses (number of standardized estimates, number of studies). a The overall effect is the strength of the effect without moderators. Arrows on the right indicate the direction of the temporal effect, based on a second model that included the minimum publication year of the dataset considered in a given study as a moderator. Bold arrows denote significant temporal trends (p < .001). Colour coding as in Fig. 1b. Model estimates and exact p-values are given in Table 1. b The overall effect is the strength of the effect with discipline included as a moderator. Model estimates are given in Table 2.

The second moderator we tested was discipline, given that the effect of non-scientific features on citations may vary in strength and direction across scientific domains (Tahamtan et al., 2016). As discipline is a categorical variable, we constructed hierarchical models that tested for differences between the levels of discipline within a given model (between-group heterogeneity). We considered cases where between-group heterogeneity was significant as evidence of a moderator effect. To balance the sample size across levels, we grouped disciplines into four levels: (i) Mathematical sciences (Engineering, Informatics, Mathematics, Physics; N = 718 estimates); (ii) Medicine (N = 718); (iii) Natural sciences (all biological disciplines and chemistry; N = 539); and (iv) Soft sciences (all humanistic disciplines, including Law, Politics, and Economics; N = 317). In all of these models, we excluded estimates deriving from articles that did not distinguish between disciplines (N = 204).
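Continuing the sketch, discipline enters as a categorical moderator, and the omnibus QM test of the moderator coefficients corresponds to the between-group heterogeneity test described above (discipline labels are assigned arbitrarily here).

```r
# Discipline as a categorical moderator; QM tests between-group heterogeneity
dat$discipline <- factor(rep(c("Mathematical", "Medicine",
                               "Natural", "Soft"), length.out = 12))
mod_disc <- rma.mv(yi, vi, mods = ~ discipline,
                   random = ~ 1 | pub / id, data = dat)
mod_disc$QM   # omnibus test statistic for the discipline effect
mod_disc$QMp  # its p-value
```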

Publication bias

Publication bias arises when studies finding null results are less likely to be published than those finding supportive evidence, a phenomenon also termed the “file-drawer effect” because such results are imagined to rest, unread, in scientists’ drawers. We evaluated publication bias via fail-safe number analysis, as implemented in the ‘metafor’ function fsn. Specifically, we used Rosenthal’s method (Rosenberg, 2005; Rosenthal, 1979) to calculate the number of studies averaging null results that would have to be added to the observed set of outcomes for each predictor to reduce the combined significance level to a target alpha of 0.05. According to the fail-safe N analysis (Table 1), there was no evidence of publication bias in any model except for Number of keywords and Title pleasantness, which must therefore be interpreted with caution.
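A minimal sketch of this calculation with ‘metafor’ (illustrative effect sizes and sample sizes, not the actual data):

```r
library(metafor)

# Fisher's z effect sizes for one hypothetical predictor
dat <- escalc(measure = "ZCOR",
              ri = c(0.15, 0.10, 0.22, 0.08),
              ni = c(120, 340, 90, 500))

# Rosenthal's fail-safe number: how many null-result studies would have to
# be added to raise the combined p-value above alpha = 0.05
fsn(yi, vi, data = dat, type = "Rosenthal", alpha = 0.05)
```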

Table 1 Estimated model parameters (full model and model with year as moderator)

Results and discussion

Influence of non-scientific features on citations

We identified 2,312 studies in the initial literature search (Supplementary material—Appendix S2), of which 997 were deemed relevant for testing for the effect of non-scientific features on article impact. A subset of 262 studies satisfied the necessary inclusion criteria for the meta-analysis (Supplementary material—Fig. S1). Number of citations was the sole response variable with sufficient sample size, namely 250 articles and 2,361 unique standardized estimates (Fig. 1a).

Overall, the effects of non-scientific features on article citations were consistently weak, with all standardized Pearson’s |r| < 0.2 (Fig. 2a). This is in line with the general understanding that variables related to the scientific content (e.g., importance of the topic, level of evidence, novelty, study design, methodology), rather than non-scientific features, are the most important factors capturing variance in citation outcomes (Tahamtan et al., 2016). However, several effects were highly significant, suggesting that many non-scientific features can contribute to inflating an article’s citation rate.

There is an implicit assumption that the variables we have analysed are not genuine measures of scientific quality, but it could be argued that a few are. In other words, some of the factors considered here may sometimes be plausible surrogates of ‘quality’ (e.g., author experience, collaboration measures, Impact Factor), whereas most others are just artefacts of the publishing and citation system (e.g., reference lists, article length, number of keywords, self-citations).

Among the non-scientific features that might be at least partially related to scientific quality, those referring to journal impact and author experience had the strongest positive effects. Publishing in prestigious journals with higher Impact Factors gives a citation advantage, possibly due to the greater visibility of, and perceived trust in, the science appearing in top-tier journals (Callaham et al., 2002). Furthermore, papers with more authors, more experienced authors, and authors with broader collaborative networks tended to be more highly cited. This finding matches the results of an independent meta-analysis (Shen et al., 2021) and may be explained by multiple factors. On the one hand, the quality of research is likely to be higher when multi-disciplinary teams and experienced authors work together (Cardoso et al., 2021; Falkenberg & Tubb, 2017; James Jacob, 2015). On the other hand, authors who are famous, highly cited, and/or more advanced in their careers may attract more citations owing to their prestige and perceived credibility in the field (Tahamtan et al., 2016). Furthermore, higher citation counts may simply result from the greater visibility arising from a multitude of co-authors and their respective networks (Bosquet & Combes, 2013).

Further positive effects were associated with different features of reference lists, including reference list length, the overall impact of the literature cited in a given paper, and the number of self-citations. The positive effect of self-citations on the number of citations suggests that excessive self-citation may be used by some authors as a way to increase their visibility and boost their citation metrics (Fowler & Aksnes, 2007). However, self-citations are also an integral part of scientific progress, reflecting the cumulative nature of individual research (Glänzel & Thijs, 2004; Ioannidis, 2015; Mishra et al., 2018; Penders, 2018). Papers with a high number of self-citations that are part of a long-term research line are often more visible and citable, and this may lead to the accumulation of more citations (Mammola et al., 2021; Table 2).

Table 2 Estimated model parameters (model with discipline as moderator)

Additional significant positive effects refer to factors that are likely to be mostly structural, including article length and number of graphical items. These results match those of previous studies (Tahamtan & Bornmann, 2019; Tahamtan et al., 2016; Xie et al., 2019). This effect may arise because longer papers that cite more articles and include more figures may address a greater diversity of ideas and topics (Ball, 2008; Elgendi, 2019; Fox et al., 2016). Indeed, there is evidence that, in recent years, individual papers have become more densely packed with information and thus may contain more citable material (Cordero et al., 2016). Furthermore, longer reference lists may make papers more visible in online searches, while also attracting tit-for-tat citations, that is, the tendency of cited authors to cite the papers that cited them (Mammola et al., 2021).

No other analysed predictor had a significant effect on citations (Fig. 2a). Among others, the lack of an effect of gender may come as a surprise, given the timeliness of the discourse on gender biases (Abramo et al., 2021; AlShebli et al., 2020; Casad et al., 2021; Davies et al., 2021; Holman & Morandin, 2019; Kwon, 2022). Note that the effect was in the expected direction (negative) but non-significant, as the confidence interval overlapped zero (Fig. 2a). While the existence of gender inequality within academia is undeniable, this bias may not be best captured by citations (but see Dworkin et al., 2020, for larger effects in neuroscience). For example, a recent analysis of more than 1 million medical papers published between 2008 and 2014 showed that differences in citation distributions between males and females are very small and mostly attributable to journal prestige and self-citations (Andersen et al., 2019). Indeed, once a paper is published, it is unlikely to be cited less on the basis of gender, in part because only the surnames of authors are available in most reference managers. Conversely, gender bias in publication outcomes may be better captured by other response variables, especially those related to peer-review success (Fox & Paine, 2019) or Impact Factor (Barrios et al., 2013; Holman & Morandin, 2019), which we could not analyse due to low sample size (Fig. 1a).

Repeating the analysis for studies that considered only a single journal revealed patterns largely consistent with the main analysis (cf. Fig. 2 and Supplementary material Fig. S2), indicating that the observed effects operated within journals as well as between them; that is, the observed effects were unlikely to be due to particular formats being associated with more highly cited journals.

Temporal trends

There was a significant temporal effect for journal Impact Factor, whereby the positive effect of journal impact on citations has become stronger over time (Fig. 2a). There were also significant temporal effects for the number of authors and the level of collaboration, with more authors and broader collaborative networks exerting a stronger influence on citations in recent years. Science is indeed becoming ever more collaborative and trans-disciplinary (Knapp et al., 2015; Sahneh et al., 2021), a trend made possible by major technological advances in communication that enable effective collaboration among multiple authors and large consortia, for example in medicine (International Human Genome Sequencing Consortium, 2004) and physics (Castelvecchi, 2015).

Other significant temporal effects pertained to the decreasing influence of reference list features on citations in recent years. This weakening association may reflect better referencing practices by authors, possibly facilitated by increasing awareness of the importance of responsible referencing (Kwon, 2022; Penders, 2018), but also by the greater availability of powerful technologies for browsing the literature online (Gingras et al., 2009; Mammola et al., 2021).

Discipline-specific effects

The above patterns were consistent across most disciplines, with a few exceptions (Fig. 2b, Table 2). The natural sciences deviated most markedly with respect to open access, journal Impact Factor, and online sharing. The availability of funding positively influenced citations in medicine, a field that notoriously receives substantial financial support (Murphy & Topel, 2010). Furthermore, longer papers were associated with more citations in medicine and the natural sciences, whereas more concise papers had a citation advantage in the mathematical and soft sciences.

A recipe for success?

Our analysis indicates that patterns exist in the data regarding the effect of non-scientific factors on how often a particular paper is cited. Insofar as citation counts are widely used for measuring academic performance, one might be tempted to use this knowledge to game the system of citation counts when writing a paper. However, this is likely to be problematic. If citations are determined by a number of technical factors unrelated to scientific quality, evaluating the impact of a paper solely on the basis of this blunt metric entails a substantial intrinsic bias, since scientific quality is necessarily a multifaceted concept (Polanyi, 1962). This is why the modern discourse on this subject focuses on exploring new ways to express scientific quality, for example by decomposing it into fundamental components such as Solidity and Plausibility, Originality and Novelty, and Scientific Value (Aksnes et al., 2019).

An additional cause for concern is the existence of temporal trends towards an increase in some of these effects over time. This ‘rise and fall’ of some non-scientific features implies that the number of citations is influenced by factors that the scientific community deems important in some periods but not in others. Ideally, we would want the effects in Fig. 2a to move towards the middle (a zero effect), so that factors unrelated to scientific quality can no longer be used to boost citations. However, we found that about half of the factors analysed here are shifting away from zero (that is, they are becoming more influential over time), further strengthening the idea that the number of citations alone is not enough to evaluate the scientific quality of a paper.

As a final corollary, it is important to note that the publishing behaviours of authors are often modulated by journals and institutions. For example, most journal guidelines limit abstract and article length, the number of keywords, and the number of references, and occasionally even the number of authors (e.g., in Trends in Ecology and Evolution). Furthermore, assessment agencies and institutions are increasingly using Altmetrics and citations as measures of impact. All these constraints feed back to affect the behaviour of authors, and vice versa.

Conclusion

In conclusion, let us ask one final time: “… to achieve scientific impact, should we ‘game the system’ and exploit these non-scientific features to our advantage?” Although our meta-analysis showed that some non-scientific features can be used to increase the probability of being cited, the temporal variability of their relevance undermines any “recipe for success” based on their use. Consequently, if we want citations to be used as a measure of scientific quality, the answer is “no”; yet the fact that citations remain among the most important metrics for academics, especially early career researchers, warrants constant reflection on our publication practices. To this end, our method can act as a benchmark to monitor the influence of these features over time. Ideally, for citation counts to be adopted as a measure of scientific quality, the effect sizes in Fig. 2 should be zero for most of the features addressed. Progress towards this goal can be tracked by monitoring new literature published on the subject to see how these effects vary in the years ahead.